Introduction to C
History
The C language is closely tied to the design of the UNIX system at Bell Labs in the 1970s. Its development was influenced by two languages:
- BCPL, developed in 1966 by Martin Richards
- B, developed in 1970 at Bell Labs
The first version of the C compiler was written in 1972 by Dennis Ritchie. From there, the language grew in popularity alongside UNIX systems, and numerous versions of the C language were created:
- In 1978: K&R C, an informal specification based on the book The C Programming Language written by Brian Kernighan and Dennis Ritchie.
- In 1989: ANSI C or C89, the first official C standard.
- In 1990: ISO C or C90, the same language as C89 but published as an ISO standard.
- In 1999: C99, major extensions to the standard C language.
- In 2011: C11, minor new language features.
- In 2017: C17, minor corrections of C11 without adding new features.
- In 2023: C23, minor new language features, modernizations and cleanup.
A compiler is the tool used to translate source code into an executable program. You will study compilers and their inner workings at length during your ING1.
At EPITA, we will be using the C99 standard.
The C programming language has consistently ranked among the most popular programming languages since the 1980s according to the TIOBE index. Because of its long history and low-level design, C is known for being portable, efficient and mature. In the industry, it is widely used for operating systems and kernel development, compiler and interpreter design, libraries, embedded systems, database management systems, and more.
Syntax reminders
Comments
In C there are two types of comments: single-line comments and multi-line comments. See the examples for the syntax of each type of comment.
// I am a single-line comment
/* I am a single-line comment in the multi-line style */
/*
** I am a multi-line comment authorized by the EPITA standard
*/
/*
I am also a multi-line comment, but not authorized by the coding style
*/
Variables
A variable consists of:
- A type, which is either one of the built-in C types or a user-defined type.
- An identifier (a name) that must respect the following naming conventions:
  - Start with a letter or an underscore ('_'). Note that starting with '_' is forbidden by the coding style.
  - Consist of a sequence of letters, digits or underscores.
  - Be different from C keywords.
- Possibly a value.
int i;
int j = 3;
char c = 'a';
float f = 42.42;
You can then use declared variables in the program by using their identifiers.
int a = 1;
int b = 41;
int sum = a + b; // sum == 42
Predefined types
Basic data types of C
- void: means "having no type". A variable cannot have this type; it is used as the return type of procedures (see below).
- char: a character (which is actually a number) stored in a single byte.
- int: an integer whose size depends on the architecture of the machine (typically 2 bytes on 16-bit architectures, 4 bytes on 32-bit and 64-bit architectures).
- float: a single-precision floating point number (typically 4 bytes).
- double: a double-precision floating point number (typically 8 bytes).
It is possible to apply a number of qualifiers to these data types. The following apply to integers:
Name | Bytes | Possible values ($-2^{n-1}$ to $2^{n-1}-1$, where $n$ is the number of bits) |
---|---|---|
short (int) | 2 | $-2^{15}$ to $2^{15}-1$ (-32 768 to 32 767) |
int | 2 or 4 | $-2^{15}$ to $2^{15}-1$ or $-2^{31}$ to $2^{31}-1$ |
long (int) | 4 or 8 | $-2^{31}$ to $2^{31}-1$ or $-2^{63}$ to $2^{63}-1$ |
long long (int) | 8 | $-2^{63}$ to $2^{63}-1$ |
These sizes are the usual ones, not guarantees: the standard only sets minimum ranges. In particular, the size of long depends on your platform: it is 4 bytes on 32-bit architectures and 8 bytes on most 64-bit Unix-like systems.
A bit can have two values, 0 or 1. A byte is 8 bits long, thus having values from 0 to 255 (11111111 in binary).
For example:
short int shortvar;
long int counter;
In that case, int is optional.
By default, integer types are signed, which means that variables of these types can hold negative or positive values. It is also possible to use unsigned types thanks to the unsigned keyword. You can state explicitly that a type is signed with the signed keyword, but since integers are signed by default, this keyword is rarely used.
- The signed and unsigned qualifiers only apply to integer types (char and int).
- The char type is either signed or unsigned by default: it depends on your compiler.
Booleans
A boolean is a value that is either true or false. Booleans are used in control structures.
In the beginning, there was no boolean type in C and integer types were used instead:
- 0 is considered false.
- Any other value is considered true.
The C99 standard introduced the _Bool type, which can hold the values 0 and 1. The header stdbool.h was also added: it defines the bool type, a shortcut for _Bool, and the values true and false. You will learn more about headers later.
Typecast (implicit type conversion)
When an expression involves data of different but compatible types, one can wonder about the result's type. The C compiler automatically performs conversion of "inferior" types to the biggest type used in the expression.
int i = 42;
int j = 4;
float k = i / j; // k equals 10.0
The variables i and j both have type int, so the division is performed on integers: its result is the int 10, which is only then converted to float. To get a float result, we first store one operand in a float variable, so that the division itself is performed on floats:
int i = 42;
int j = 4;
float t = i;
float k = t / j; // k equals 10.5
Since t has type float, j is implicitly converted to float before the division. The result has type float and the value 10.5 is stored in k.
Operators
Binary operators
Arithmetic operators
For arithmetic operations, the usual operators are available:
Operation | Operator |
---|---|
addition | + |
subtraction | - |
multiplication | * |
division | / |
remainder | % |
The result of a division between two integers is truncated.
Example:
float i = 5 / 2; // i == 2.0
float j = 5. / 2.; // j == 2.5, note that 5. is equivalent to 5.0
Comparison operators
These operators yield an int used as a boolean: 1 (true) if the comparison holds, 0 (false) otherwise:
Operation | Operator |
---|---|
equal to | == |
not equal to | != |
greater than | > |
greater than or equal to | >= |
less than | < |
less than or equal to | <= |
Logical operators
Logical OR ||: condition1 || condition2 || ... || conditionN
The previous expression is true if at least one of the conditions is true, false otherwise.
Logical AND &&: condition1 && condition2 && ... && conditionN
The previous expression is true if all conditions are true, false otherwise.
Conditions are evaluated from left to right, and each one is only evaluated when necessary (this is called lazy or short-circuit evaluation). For example, with two conditions separated by &&, if the first one is false, the second one is not evaluated, because the result is already known: false. The same goes for a true condition on the left of a ||: the result is necessarily true.
Example:
int a = 42;
int b = 0;
(a == 1) && (b = 42);
// b equals 0, and not 42, because 'b = 42' has not been evaluated
Assignment Operators
- Classical assignment: =. This operator assigns a value to a variable. An assignment is itself an expression: the value of var = 4 + 2 is 6 (the assigned value). This property allows you to chain assignments:
int i, j, k;
i = j = k = 42; // i, j and k equal 42
Note that the coding style requires one declaration per line.
- Compound assignment: a += b is a shortcut for a = a + b. The same goes for -=, *=, /= and %=.
int a = 5;
int b = 33;
a += b; // a == 38
int c += a; // does not compile: 'c' is being declared and has no value to add to
Unary operators
Negation
The - operator negates a numeric value. It is the same as a multiplication by -1.
int i = 2;
int j = -i; // j == -2
Increment/Decrement
In C you can use the ++ and -- operators to respectively increment and decrement a variable by 1.
When the ++ (or --) operator is placed on the left-hand side of the variable, it is called pre-increment (or pre-decrement). The variable is first incremented and then used in the expression.
On the other hand, when the ++ (or --) operator is placed on the right-hand side, it is called post-increment (or post-decrement). The variable is first used in the expression and then incremented.
int i = 2;
int j;
int k;
j = i++; // j == 2 and i == 3
k = j + ++i; // k == 6 and i == 4
Not
The ! operator is used with a boolean condition. Its effect is to reverse the value of the condition:
- if CONDITION is true, then !CONDITION is false;
- if CONDITION is false, then !CONDITION is true.
Priorities
The following operators are given from highest to lowest priority. Their associativities are also given: left or right. This is not a list of ALL operators in C, rather the most common ones.
Category | Operators | Associativity |
---|---|---|
parentheses | () | Left |
unary | + - ++ -- ! | Right |
arithmetic | * / % | Left |
arithmetic | + - | Left |
comparisons | < <= > >= | Left |
comparisons | == != | Left |
logical | && | Left |
logical | || | Left |
ternary | ?: | Right |
assignment | = += -= *= /= %= | Right |
In programming languages, associativity is to be understood as operator associativity. When two operators are of the same precedence, in order to determine how to resolve the order of execution, we look at their respective associativity.
Left associativity indicates that operations are resolved left to right.
Right associativity indicates that operations are resolved right to left.
Example
int a = 1;
int result = ! -- a == 3 / 3;
The following rules are applied in this order to resolve priorities:
- The unary operators ! and -- have the highest priority. As both are right-associative, -- is resolved before !.
- The arithmetic division / is now the operator with the highest priority, so the next operation is 3 / 3.
- Finally ==, which has the lowest priority, is evaluated.
Here --a yields 0, !0 yields 1 and 3 / 3 yields 1, so result equals 1.
We could rewrite this whole operation as:
int result = (!(--a)) == (3 / 3);
Associativity is not always obvious: do not hesitate to add parentheses, even if they are not required, to make some operator priorities explicit and ensure the code is easily readable.
ASCII
The American Standard Code for Information Interchange (abbreviated ASCII) is one of the most widely used encoding standards in the world. It was developed in the 1960s and maps 128 characters based on the English alphabet to numerical values.
You can see the ASCII table by typing man ascii in your terminal.
You should really take a look at the ASCII table and notice a few things:
- Characters are sorted logically, 'a' to 'z' are contiguous, as well as 'A' to 'Z' and '0' to '9'.
- The character '0' does not have the value 0.
- Some characters cannot be printed (for example ESC or DEL).
In C, a variable of type char can at least take values from 0 to 127, where each value in this range corresponds to a character in the ASCII table. Since the value of a char variable is a number, numerical operations can be performed on it.
#include <stdio.h>
int main(void)
{
    char c = 'A';
    c += 32;
    if (c >= 97 && c <= 122)
        puts("'c' has become a lowercase character!");
    return 0;
}
However, this style is hard to read, because the numeric values hide the intent. We prefer the following:
#include <stdio.h>
int main(void)
{
    char c = 'A';
    c += 'a' - 'A';
    if (c >= 'a' && c <= 'z')
        puts("'c' has become a lowercase character!");
    return 0;
}
You might wonder what the ASCII value of a given letter is. In practice it does not matter: you should always use the character itself when performing operations on characters.
Control structures
Instructions and blocks
A block groups several instructions or expressions. It creates a scope in which the variables it declares "live". It is delimited by { and }. The body of a function is a special kind of block. Blocks may be nested, and may be empty.
If ... else
Conditions allow the program to execute different instructions based on the result of an expression.
if (expression)
{
    instr1;
}
else
{
    instr2;
}
For example:
if (a > b)
    a = b;
else
    a = 0;
Notice that there are no braces here: if a block contains only one instruction, braces may be omitted.
Ternary operator
This operator performs a test and yields a value. It is a compact version of if ... else.
condition ? exp1 : exp2
It reads as follows:
"if" condition "then" exp1 "else" exp2
Example:
int i = 42;
int j = (i == 42) ? 43 : 42; // j equals 43
While
A loop repeats its instructions while the condition is met.
while (condition)
{
    instr;
}
Braces are mandatory only if instr is made of several instructions.
Example:
int i = 0;
while (i < 100)
{
    i++;
}
Do ... while
The condition is checked only after the first run of the loop. Hence, instr is always executed at least once.
do {
    instr;
} while (condition);
Example:
int i = 0;
do {
    i++;
} while (i < 100);
For
Prefer the more compact for loop syntax when you need to repeat the same instructions a known number of times.
for (initialization; condition; increment)
{
    instr;
}
Example:
for (int i = 0; i < 10; i++)
{
    // do something 10 times
}
Break, continue
- break: exits the current loop.
- continue: skips the rest of the current iteration of a loop and goes directly to the next iteration.
Example:
for (int i = 0; i < 10; i++)
{
    if (i == 2 || i == 4)
        continue;
    else if (i == 6)
        break;
    puts("I am looping!");
}
// The text "I am looping!" will only be printed 4 times.
Switch
The switch statement executes instructions depending on the evaluation of an expression. It is more elegant than a series of if ... else when dealing with a large number of possible values for one expression.
switch (expression)
{
case value:
    instr1;
    break;
/* ... */
default:
    instrn;
}
Details:
- value is an integer constant or an enumeration value.
- expression must have an integer or enumeration type.
It is important to put a break at the end of each case; otherwise, execution continues into the following cases until the first break is reached. The default case is optional. It is used to perform an action if none of the previous values match.
Example:
switch (a)
{
case 1:
    b++;
    break;
case 2:
    b--;
    break;
default:
    b = 0;
}
Functions
Definition
A function can be defined as a reusable and customizable piece of source code that may return a result. In C, there is barely any difference between functions and procedures: procedures can be seen as functions that do not have a return value (their return type is void).
Use
A function is made of a prototype and a body.
Prototypes follow this syntax:
type my_func(type1 var1, ...);
- type is the return type of the function (void in the case of a procedure).
- my_func is the name of the function (or symbol) and follows the same rules as variable names.
- (type1 var1, ...) is the list of parameters passed to the function.
If the function has no parameter, you have to put the void keyword instead of the parameter list:
type my_func2(void);
Definition of the body:
type my_func(type1 var1, type2 var2, ...)
{
    /* code ... */
    return val;
}
The execution of the return instruction stops the execution of the function. If the function's return type is not void, return with a value is mandatory: reaching the end of such a function without one and then using its result is undefined behavior. If the return type is void, return is optional and its only use is to end the function's execution early (return;).
When a function has no parameter, forgetting the void keyword can lead to bugs.
Notice the difference between type my_func(void) and type my_func():
- The type my_func(void) syntax indicates that the function takes no arguments.
- In C99, the type my_func() syntax means that the function takes an unspecified number of arguments (zero or more). You must avoid using this syntax.
When a function takes arguments, declare them; if it takes no arguments, use void.
Here is an example showing the risk of forgetting the void keyword.
int foo()
{
    if (foo(42))
        return 42;
    else
        return foo(0);
}
If you test this code, you will see that it compiles, but running it causes undefined behavior. However, if you declare int foo(void), the compiler reports an error.
Function call
In order to use a function, you need to call it, using this syntax:
my_fct(arg1, ...)
Arguments can either be variables or literal values.
Example:
int sum(int a, int b)
{
    return a + b;
}
int a = 43;
int c = sum(a, 5); // c == 48
If you want to call a function that does not take any argument, just leave the parentheses empty.
Arguments of a function are always passed by copy, which implies that their modification will not have an impact outside the function.
#include <stdio.h>
void modif(int i)
{
    i = 0;
}
int main(void)
{
    int i;
    i = 42;
    modif(i);
    if (i == 42)
        puts("Not modified");
    else
        puts("Modified");
    return 0;
}
The previous example displays "Not modified".
Recursion
It is possible for a function to be recursive, that is, to call itself. The following example returns the sum of the numbers from 0 to i, for i greater than or equal to 0.
int recurse(int i)
{
    if (i)
        return i + recurse(i - 1);
    return 0;
}
Forward declaration
Sometimes, it is necessary to use a function before its definition (before its code). In this case, it is enough to write the function's prototype above the location where we want to make the function call, outside of any block. This is the same as declaring the function (to declare that the function exists) without defining it (implementing its body). Hence, the compiler will know that the function exists but that its implementation will be given later.
Example (note the ; at the end of the prototype):
int my_fct(int arg1, float arg2);
int my_fct2(int arg1)
{
    return my_fct(arg1, 0.3f);
}
int my_fct(int arg1, float arg2)
{
    // a real implementation would compute its result here
    return 0; // placeholder result
}
Without the forward declaration, the compiler would tell you it does not know the function my_fct.